In [1]:
%matplotlib inline
import pandas as pd
from pandas import plotting  # pandas.tools.plotting was moved to pandas.plotting
import mia
In [2]:
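# DataFrame.from_csv was removed from pandas; this read_csv call uses the same defaults
raw = pd.read_csv('../results/2015-03-05-results.csv', index_col=0, parse_dates=True)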
meta_data = raw[['patient_id', 'class', 'side', 'view']]
raw.index = raw.image_name
raw.describe()
Out[2]:
In [15]:
[g['radius'].describe() for i, g in raw.groupby('class')]
r = raw[raw['radius'] > 30]
mia.plotting.plot_risk_classes(r, 'radius')
Examine the raw intensity statistics for each image. There is a very high response in the lower range of the data. This is almost certainly down to a high average intensity in the smaller blobs. Again, as with the shape-based measures, the risk classes differ in the number of responses rather than in the shape of the distribution.
In [78]:
intensity_columns = [c for c in raw.columns if 'intensity' in c]
# Copy so that adding a column later does not write to a view of raw
intensity_stats = raw[intensity_columns].copy()
intensity_stats.describe()
Out[78]:
In [87]:
intensity_stats['class'] = meta_data['class']
mia.plotting.plot_risk_classes_single(intensity_stats, 'avg_intensity')
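One way to check whether the risk classes differ in the number of responses rather than in the shape of the distribution is to put per-class counts next to a few summary statistics. The following is a minimal sketch on a small synthetic frame: the column names `class` and `avg_intensity` follow the notebook, but the values and the helper `summarise_by_class` are made up for illustration and are not part of `mia`.

```python
import pandas as pd


def summarise_by_class(frame, value_column, class_column='class'):
    """Return the per-class count, mean and standard deviation of one column."""
    grouped = frame.groupby(class_column)[value_column]
    return grouped.agg(['count', 'mean', 'std'])


# Synthetic stand-in for the per-blob results (illustrative values only).
example = pd.DataFrame({
    'class': [1, 1, 1, 2, 2, 2, 2, 2, 2],
    'avg_intensity': [0.2, 0.4, 0.6, 0.2, 0.2, 0.4, 0.4, 0.6, 0.6],
})

summary = summarise_by_class(example, 'avg_intensity')
print(summary)
```

In this toy data both classes have the same mean but class 2 has twice as many rows, which is the kind of pattern described above. Whether the real results behave this way has to be read from the actual data.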
In [80]:
features = mia.reduction.feature_statistics(raw)
meta_data = features[['class', 'patient_id', 'view', 'side']]
features.describe()
Out[80]:
In [81]:
intensity_columns = [c for c in features.columns if 'intensity' in c]
intensity_features = features[intensity_columns]
intensity_features.describe()
Out[81]:
Using radviz on the intensity features shows that the biggest contributors are (in order of importance):
In [82]:
intensity_norm = mia.analysis.normalize_data_frame(intensity_features)
intensity_norm.columns = intensity_columns
intensity_norm['class'] = meta_data['class']
In [83]:
columns = [c for c in intensity_norm.columns if c not in ['std_std_intensity', 'std_avg_intensity', 'avg_avg_intensity', 'avg_std_intensity']]
plotting.radviz(intensity_norm[columns], 'class')
Out[83]:
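For reading the radviz plot it helps to recall how the projection works: each feature gets an anchor evenly spaced on the unit circle, and every sample is drawn at the average of those anchors weighted by its min-max normalised feature values. A sample pulled towards one anchor is therefore dominated by that feature. The sketch below reproduces that projection without any dependencies. The function names and data are hypothetical, and it is an assumption that `mia.analysis.normalize_data_frame` performs a comparable min-max scaling.

```python
import math


def min_max_normalise(column):
    """Scale a list of numbers to the range [0, 1]."""
    low, high = min(column), max(column)
    span = high - low
    if span == 0:
        return [0.0 for _ in column]
    return [(value - low) / span for value in column]


def radviz_points(rows):
    """Project rows of normalised features onto the radviz plane.

    Each feature gets an anchor evenly spaced on the unit circle; a row is
    placed at the average of the anchors weighted by its feature values.
    """
    n_features = len(rows[0])
    anchors = [(math.cos(2 * math.pi * j / n_features),
                math.sin(2 * math.pi * j / n_features))
               for j in range(n_features)]
    points = []
    for row in rows:
        total = sum(row)
        if total == 0:
            # Choice made for this sketch: a row with no pull sits at the centre.
            points.append((0.0, 0.0))
            continue
        x = sum(w * ax for w, (ax, _) in zip(row, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(row, anchors)) / total
        points.append((x, y))
    return points


# Three made-up features for four samples (illustrative values only).
feature_columns = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 10.0, 20.0, 30.0],
    [5.0, 0.0, 5.0, 0.0],
]
normalised = [min_max_normalise(c) for c in feature_columns]
rows = list(zip(*normalised))
points = radviz_points(rows)
for point in points:
    print(point)
```

Because every point is a weighted average of anchors on the unit circle, all samples fall inside the circle, and samples with balanced features end up near the centre.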
In [84]:
intensity_features = intensity_features.fillna(0)
mapping = mia.analysis.tSNE(intensity_features)
mapping['class'] = meta_data['class']
mia.plotting.plot_scatter_2d(mapping, [0, 1], label_name='class')
Taking only the most significant features determined via radviz, the data set appears to form clear bands. Risk class 2 also appears to be clustered more towards the end of the data.
In [85]:
sig_features = intensity_features[['std_std_intensity', 'std_avg_intensity', 'avg_avg_intensity', 'avg_std_intensity']]
sig_mapping = mia.analysis.tSNE(sig_features)
sig_mapping['class'] = meta_data['class']
mia.plotting.plot_scatter_2d(sig_mapping, [0, 1], label_name='class')
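`mia.analysis.tSNE` is specific to this project and its implementation is not shown here. As a rough stand-in, the sketch below runs scikit-learn's `TSNE` on synthetic data with four features per sample. It is an assumption that the project wrapper behaves similarly, and the data, parameters and variable names are illustrative only.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)

# Two made-up groups of samples with four features each (illustrative only).
group_a = rng.normal(loc=0.0, scale=1.0, size=(20, 4))
group_b = rng.normal(loc=5.0, scale=1.0, size=(20, 4))
synthetic_features = np.vstack([group_a, group_b])

# perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=10, init='random',
                 random_state=0).fit_transform(synthetic_features)
print(embedding.shape)
```

Note that t-SNE preserves local neighbourhoods rather than global distances, so apparent bands or gaps in the mapping are best treated as hints to follow up on rather than as measurements.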
Plot some of the features in 3D against one another to examine the relationship between them.
In [86]:
intensity_features['class'] = meta_data['class']
columns = ['avg_std_intensity', 'std_avg_intensity', 'avg_avg_intensity']
mia.plotting.plot_scatter_3d(intensity_features, columns=columns, labels=meta_data['class'])